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ABSTRACT 

'Consideration  is  given  to  the  means  by  which  appropriate 
diagnostic  checking  functions  of  the  data  can  be  developed  to 
guard  against  feared  model  discrepancies.  A  formal  basis  for 
the  selection1 of  a  function  is  given  for  situations  where  the 
feared  inadequacy  can  be  cha racter i zed  by  a  discrepancy  para- 
meter  ^  which  takes  a  (possible  i  na  ppropr  i  a  t  e)  value  of 
in  the  model.  The  relationship  of  this  checking  function  with 
the  posterior  distribution  obtained  from  an  elaborated 
( "robusti f i ed" )  model  which  allows  for  the  discrepancy  parameter 
to  be  estimated  is  discussed.  The  nature  of  the  diagnostic  check 
is  briefly  described  for  problems  relating  to  transformation  of 
the  dependent  variable  and  to  serial  correlation;  while  a  more 
thorough  investigation  of  the  checking  function  is  given  for 
problems  relating  to  outlying  observations  and  to  transformation 
of  predictor  variables.  Several  examples  are  given  to  illustrate 
these  ideas.  ^ 
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*  it* 


SIGNIFICANCE  AND  EXPLANATION 


Statistical  methods  are  useful  as  tools  in  scientific  model¬ 
building  investigations.  In  particular,  at  each  stage  of  an  investi¬ 
gation  it  is  of  interest  to  not  only  estimate  the  parameters  of  the 
model  being  postulated  at  that  time,  but  to  also  check  the  fitted 
model  in  its  relation  to  the  data  with  the  intent  to  reveal  its  in- 
adequaci  es  ,  if  any. 

In  this  paper,  a  diagnostic  checking  function  is  developed  for 
the  latter  purpose  above.  This  diagnostic  check  is  useful  in  situa¬ 
tions  where  a  feared  model  inadequacy  can  be  characterized  by  a  so- 
called  discrepancy  parameter  3  which  may  have  a  true  value  differ¬ 
ent  from  the  value  6q  assumed  by  the  current  model.  The  relation¬ 
ship  of  using  this  checking  function  to  using  a  broader  model  which 
allows  3  to  be  estimated  (rather  than  assuming  that  3  =  6g)  is 
ex  pi ored . 

Examples  are  discussed  in  which  3  measures  the  need  to  allow 
for  outliers,  the  need  for  data  transformation,  and  the  need  to 
allow  for  serial  correlation  of  errors. 


The  responsibility  'or  the  wording  and  views  expressed  in  this 
oescriptive  summary  lies  with  MRC,  and  not  with  the  authors  of 
this  report. _ 


THE  DUALITY  OF  DIAGNOSTIC  CHECKING  AMD  ROBUST  I F  I  CAT  I  ON 
IN  MODEL  BUILDING:  SOME  CONSIDERATIONS  AND  EXAMPLES 

Steven  P.  Bailey  and  George  E.  P.  Box 


1 .  Introduction. 

"The  advancement  of  statistical  methods  to  the  present 
state  has  depended  critically  upon  the  interaction  of  mathe¬ 
matics  with  real  data,  and  we  cannot  but  benefit  If  more 
attention  is  given  to  the  characteristics  of  the  real  world 
than  has  been  done  in  recent  decades."  This  view,  expressed 
by  Stlgler  (1977),  is  characteristic  of  the  recent  renewal 
of  interest  in  finding  out  what  the  real  world  is  really 
like.  (See  also  Box,  197 9d . ) 

In  this  spirit,  Chen  and  Box  (1979a, b)  recently  con¬ 
ducted  a  study  of  real  data.  In  their  investigation  of  nine 
data  sets,  they  used  the  contaminated  exponential  power  dis¬ 
tribution  in  modeling  the  experimental  errors,  and  the  results 
suggest,  in  their  words,  that  "heavy  tailed  distributions  are 
sometimes  caused  by  inhomogeneity  in  mean  and  variance  such 
as  might  be  encountered  early  in  an  experiment  because  of 
start-up  difficulties.  After  such  inhomogeneity  has  been 
allowed  for,  the  observations  seem  to  be  adequately  repre¬ 
sented  by  a  contaminated  normal  distribution.  Thus,  for  a 
carefully  planned  experiment,  a  contaminated  normal  distri¬ 
bution  is  likely  to  be  appropriate." 

Thus  a  particular  combination  of  robustif ication  and 

diagnostic  checking  techniques  are  being  recommended  as 
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appropriate  for  modeling  data  from  a  well-planned  investi¬ 
gation.  Specifically,  the  investigator  who  wishes  to  imple¬ 
ment  the  above  suggestions  can  do  so  by  making  provisions  in 
the  model  for  the  possibility  of  discrepant  observations  (and 
thereby  robustifying  against  outliers)  while  tentatively  ex¬ 
cluding  from  the  model  additional  provisions  for  the  pos¬ 
sibility  of  non-normal  kurtosis  (and  thereby  being  content 
to  later  perform  a  diagnostic  check  to  assess  whether  or 
not  this  exclusion  is  warranted). 

Of  course,  the  experienced  investigator  will  be  aware 
of  the  fact  that  there  are  numerous  other  potential  problems 
besides  the  two  mentioned  above  which  need  to  be  considered 
when  deciding  upon  what  to  include  in  a  model  versus  what 
to  tentatively  omit  from  a  model.  Of  particular  importance 
are  questions  concerning  the  possibility  of  serial  cor¬ 
relation  and  the  need  to  transform  the  response  and/or 
predictor  variables. 

Since  it  is  impossible  to  attempt  to  provide  for  all 
possible  contingencies  when  choosing  a  model,  it  is 
important  for  the  investigator  to  have  available  appropriate 
diagnostic  checking  procedures  which  are  capable  of  detecting 
when  deficiencies,  such  as  the  ones  mentioned  above,  exist  in 
the  model.  A  general  method  for  developing  such  procedures 
will  now  be  considered. 


2. 


A  formal  basis  for  the  selection  of  a  diagnostic 
checking  function"! 

As  aiscussed  by  Box  (1979b),  criticism  of  a  tentative 
model  M  in  the  light  of  the  observed  data  y  ^  is  often 
done  informally  (for  example,  by  examining  residuals), 
and  yet  has  an  underlying  formal  justification  in  that 
any  peculiar  aspect  of  yd  which  may  cause  the  investigator 
to  doubt  the  adequacy  of  M  will  correspond  to  an  appropriate 
summary  measure  g ( y  ^ )  of  that  aspect  which  is  judged 
unusual  when  assessed  against  its  predictive  reference 
distribution,  p ( g ( y ) | M ) . 

A  formal  basis  for  the  selection  of  a  checking  function 
g(y)  has  recently  been  suggested  by  Box  (1979c)  in  the  con¬ 
text  of  considering  any  particular  model  departure  that  can 
be  represented  in  terms  of  a  so-called  discrepancy  parameter 
B  .  Letting  8Q  denote  the  assumed  value  of  8  for  the 
model  M  ,  this  proposed  checking  function  is  given  by 


77  in 


(1) 


where  p(y|B)  denotes  the  conditional  predictive  distribution 
for  a  given  choice  of  B  (and  hence  p(y|SQ)  is  the  predic¬ 
tive  distribution  appropriate  for  the  model  M  ). 

One  justification  for  the  appropriateness  of  (1)  as 
a  diagnostic  checking  function  can  be  developed  by  noting 
the  relationship  between  the  conditional  predictive 


distribution  p(y|g)  and  the  pseudo-likelihood  function 
P  (8|y)  •  As  used  by  Box  (1979a;  see  also  Bailey  and  Box,  19- 

the  pseudo-likelihood  arises  when,  instead  of  setting  8 
at  a  specified  value  0g  for  the  model,  the  discrepancy 
parameter  is  assessed  a  prior  distribution  p(8)  and  esti¬ 
mated  along  with  the  other  model  parameters.  Defined  by 

p(S|y) 

p£(s|y)  =  — — .  (2) 

p(e) 

the  pseudo-likelihood  differs  for  differing  choices  of  p(s) 
only  by  a  multiplicative  constant  which  depends  on  y  ,  and 
hence  it  represents  information  about  6  contained  in  the 
data.  It  is  convenient  to  consider  for  definiteness  a 
standardized  form  of  (2)  ;  namely,  the  posterior  distribu¬ 
tion  pu(s|y)  which  results  from  using  a  uniform  prior 
pu ( 0 )  .  For  this  choice  it  then  follows  from 


p(8|y)  «  p (y | 0 ) p ( e ) 


(3) 


that 


Pu(8|y)  «  p(y|s) 


(4) 


Hence  (l)  can  be  rewritten 
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Is  ln  Pyfely) 


g6(y) 


(  5) 


so  that  this  predictive  check  is  seen  to  be  measuring  an 
Intuitively  Interesting  quantity;  namely,  the  Instantaneous 
rate  ’of  change  of  the  logarithm  of  the  pseudo-likelihood 
(or  "log  pseudo-likelihood",  for  short)  at  the  "ideal" 
value  6q  . 

The  Interpretation  of  the  predictive  check  (1)  will 
depend  upon  the  nature  of  the  discrepancy  parameter  B 
under  study.  Two  different  situations  can  occur. 

First,  consider  the  case  where  it  makes  sense  to  talk 
about  possible  3  values  over  a  wide  range,  with  the  "ideal" 
value  Bq  somewhere  In  the  middle  of  this  range.  Thus,  if 
the  actual  data  observed,  ,  give  rise  to  a  value 
g&(yd)  which  Is  small  In  absolute  value,  then  the  appropri¬ 
ateness  of  an  assumption  B  *  Bq  Is  not  seriously  questioned 
However,  an  unusually  large  value  of  | g  ) |  ,  as  assessed 
by  the  predictive  distribution  p( g& ) | B0)  »  will  cast 
doubt  on  the  validity  of  the  B  a  Bq  model  assumption. 
Specifically,  a  large  positive  value  of  will  suggest 

that  values  of  B  that  are  larger  than  Bq  may  be  more 
reasonable  to  consider,  whereas  a  large  negative  value  of 
9g(yd)  will  suggest  that  smaller  values  of  B  than  the 
ideal  value  Bq  may  be  more  appropriate.  Hence,  the  use  of 
this  predictive  check  corresponds,  in  a  sampling  theory 


framework,  to  a  hypothesis  test  of  0  =  0q  against  a  two- 
sided  alternative.  Several  examples  will  be  given  later  in 
this  section. 

Now,  consider  the  case'*  where  it  only  makes  sense  to 

talk  about  0  values  for  which  0  >  0„  .  Here,  the  pre- 

-  0 

dictive  check  (i)  can  be  viewed  as  a  sampling  theory  hypo¬ 
thesis  test  of  0  =  0g  against  the  one-sided  alternative 
corresponding  to  larger  values  of  0  .  (An  example  relevant 
to  this  case  will  be  presented  in  Section  3.) 
smaller  values  of  g  (y)  will  lend  more  support  to  a  model 

P  - 

assumption  of  0  *  0Q  than  will  larger  values  of  g^(y)  • 

Of  course  a  convenient  reference  value  to  compare  9g(y) 
to  in  this  case  is  zero  since,  from  the  earlier  argument, 
a  value  g3(yd)  >  0  will  indicate  that  the  log  pseudo¬ 
likelihood  will  initially  increase  as  0  is  allowed  to 
increase  from  0q.  while  a  value  gg(y£j)  <  0  will  indicate 
an  initial  decrease  in  P u ( S I y )  when  0  is  increased 
from  0q  . 

Some  applications  of  the  proposed  checking  function. 

Box  (1979c)  illustrates  the  use  of  the  checking 
function  (i)  by  applying  it  in  two  particular  situations 
which  we  now  briefly  discuss. 

*An  analogous  argument  can,  of  course,  be  given  for 
the  case  where  S  £  0g  by  redefining  the  discrepancy  para¬ 
meter  to  be  the  negative  of  what  it  presently  is  and  then 
using  the  above  development. 


Suppose  that  the  investigator  tentatively  fits,  to  the 
n-dimensional  vector  y  =  (y],...,yn)'  of  untransformed 
observations,  the  normal  linear  model  Nn(X0,o2I)  ,  where 
0  is  p-dimensional .  However,  suppose  further  that  he  wishes 
to  perform  a  diagnostic  check  which  would  indicate  whether  or 
not  it  would  be  more  appropriate  to  fit  the  same  model  to 
some  suitable  transformation  of  the  observations .  In  particu¬ 
lar,  if  the  family  of  transformations 

X 

yu  "  ' 

— -  ,  X  f  o 

X  u  s  1 , . . .  ,n  (6) 

In  y  ,  X  i  0 
u 

discussed  by  Box  and  Cox  (1964)  is  considered,  then  a  diag¬ 
nostic  checking  function  Q\[%)  of  the  form  (1)  can  be 
used,  where  Xg  =  l  is  the  value  of  X  for  the  tentative 
model . 

For  this  situation,  Box  (1979c)  derives  the  conditional 
predictive  distribution 


where  y  is  the  geometric  mean  of  the  untransformed 


observations.  Upon  taking  the  logarithm  of  (7)  ,  different!’ 
atlng  with  respect  to  X  ,  and  evaluating  this  derivative  at 
Xg  =  1  ,  he  finds  the  resulting  diagnostic  checking  function 
to  be 


Y'R  y 

where 

vs2  *  y ' R  y  , 


Y 

u 


y  [1  -  tn(^)] 
u  y 


U 


> 

1  1 1  ■  *  i  n  . 


(8) 


(9) 


Thus,  the  predictive  check  (8)  seeks  a  correlation 
between  the  residuals  from  fitting  y  to  Nn(X0,a2i) 
and  the  residuals  from  fitting  Y  to  Nn(X0,<J2I)  , 
since  the  two  residual  vectors  are  given  by  Ry  and  RY  , 
respectively.  If  a  strong  correlation  is  found,  then  this 
indicates  the  need  for  transformation.  Now,  by 
recalling  the  earlier  discussion  pertaining  to  the  general 
predictive  check  g^(^)  in  Its  form  (5)  ,  it  is  seen  that 

a  strong  positive  correlation  between  these  two  sets  of  re¬ 
siduals  will  Indicate  that  values  of  X  >  XQ  »  l  are  worthy 
of  consideration;  whereas  a  strong  negative  correlation 
between  the  two  residual  sets  Is  seen  to  Indicate  that 
values  of  X  <  XQ  ■  l  may  be  more  appropriate.  An  example 
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Illustrating  the  application  of  this  diagnostic  check  will 
be  given  in  Section  4. 

Note  that  this  predictive  check  can  be  conducted  in 
informal  manner  by  simply  plotting  the  two  sets  of  residuals 
against  each  other.  Note  also  the  relationship  of  this  check 
to  the  somewhat  similar  checks  proposed  by  Tukey  (1949)  and 

by  Andrews  (1971).  Specifically,  by  defining  Zu^  =  y2 

( A )  * 

and  Zy  =  yu  fcn  yu  ,  where  the  yu  's  are  the  predicted 
values  from  a  fit  of  y  to  Nn(X0,a2I)  ,  it  is  then  seen 
that  Tukey's  transformable  nonadditivity  check  seeks  a  cor¬ 
relation  between  the  residuals  from  a  fit  of  the  yy  's  to 
the  model  and  the  residuals  from  a  fit  of  the  Z  ^  's  to 
the  same  model.  Similarly,  Andrews'  check  is  concerned  with 
the  correlation  between  the  residuals  from  fitting  the  yu  's 
and  those  from  fitting  the  Zu^  's  . 

We  now  turn  to  a  second  illustration  of  the  applica¬ 
bility  of  the  general  predictive  check  (1).  Box  (1979c) 
considers  the  situation  in  which  the  Investigator  tentatively 
assumes  the  independence  of  the  yu  's  but  wishes  to  per¬ 
form  a  diagnostic  check  relative  to  the  possible  presence 
of  first-order  autoregressive  behavior  in  the  error  structure. 
(See,  for  example,  Pallesen,  1978.) 

We  shall  not  go  through  the  details  of  deriving  this  result 
here.  It  turns  out  that  the  diagnostic  checking  function  (1). 
is  a  multiple  of  the  lag  one  sample  autocorrelation  coefficient. 


-o- 


upon  which  the  familiar  test  of  Durbin,  and  Watson  (1950, 

1951)  is  based.  As  was  the  case  with  the  predictive  check 
corresponding  to  the  need  for  transformation,  this  check  for 
serial  correlation  can,  if  preferred,  be  performed  in  an 
informal  manner  through  residual  plotting  (in  this  case, 
by  plotting  the  u^  residual  versus  the  (u-l)st  residual 
for  u  s  2  , .  .  .  ,  n  ) . 

3.  diagnostic  check  for  discrepant  observations. 

We  now  illustrate  the  nature  of  the  checking  function 
(1)  for  the  situation  described  by  Box  and  Tiao  (1968),  where 
each  observation  has  a  probability  .  of  being  discrepant  in 
some  specific  manner.  The  predictive  distribution  conditional 
on  a  specific  choice  of  is  expressed  in  terms  of  the  n+1 

predictive  distributions  conditional  on  specific  choices  of  the 
number  of  outliers  r  as 

p(y  )  E  (p)ctr(l-a)n"rp(yjr) 

r=0  r 

(Bailey  ana  box,  i y . ,  0  )  ,  so  that 

"  ar(!-c)n-r]("}p(y|r) 

3  r=0  3a  r  ~ 

£n  p(y | a)  =  - - 

3a  ~  ^ 

I  (")  ar( 1 -a) n"rp(y | r) 
r  =  0  r 


n-1 


-n(l-a)n"1p(y!r=0)+{  Z 


r=l 


=j)}+nan  ^(ylr^n) 


8  (”)ar(l-ot)n*rp(y  r) 
r=0 


(10) 
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,  ■  •  lirjfrj irnitftfii 


'u  Ji 


Thus,  the  checking  function  g  (y)  ,  corresponding  to 
departures  from  the  ideal  situation  where  Oq  *  0  ,  becomes 


flaCjf) 


-n  p(y|r=0)+n  p(y|r»l) 
P (y I r=0) 


P(y! r*l ) 
p(yl  r«0) 


1 


with  hj 


p(jfl  r-j) 

1  "  • 
p(yl r=0) 


*  n  ( h  1  - 1 )  , 

(ID 


As  a  first  application  of  this  checking  function,  con¬ 
sider  the  analysis  of  Darwin's  data  presented  in  Bailey  and  Box 
(1980),  which  uses  a  "mistaken  sign"  model  for  the  bad  observa¬ 
tions.  Using  the  value  h^  =  2.10  from  Table  5  of  that  paper 
results  in  g  (y)  =  15(2.10  -  1)  =  16.5.  Recalling  again  the 
interpretation  of  this  diagnostic  check  that  is  suggested  by  (5), 
this  value  thus  indicates  that  n  P u (  '  1 y )  will  have  an  initial 
slope  of  16.5  at  =  Q  =  0  (inspection  of  p  (  .jy)  as  given  in 
Figure  10a  of  that  paper  bears  this  out)  and  hence  indicates  the 
relative  implausibility  of  an  assumption  that  ,  =  0  (that  is,  an 
assumption  that,  with  probability  one,  none  of  the  15  observations 
has  an  incorrect  sign). 

Now,  r  =  0  and  r  =  1  are  the  only  cases  which  enter 


into  the  calculation  of  g  (y).  Tor  convenience,  we  introduce 
the  following  notation.  Let  the  subscript  "i"  denote  those 
quantities  which  pertain  to  an  assumption  that 


the  1th  observation  Is  bad  and  all  other  observations  are 
good.  Then,  for  example,  z^  would  be  a  vector  of  zeroes,  ex¬ 
cept  for  the  1tfl  element,  which  would  equal  one  and  would  thus 
single  out  the  1th  observation  as  being  discrepant.  Also,  let 
1*0  correspond  to  the  assumption  that  no  observations  are 
discrepant. 

Returning,  then,  to  the  consideration  of  g  (y),  note  that 

**  QL  ^ 


p(y|r*l)  -  J-  ^p(y|zi)» 


so  that  the  predictive  check  (11)  can  be  rewritten 


Sa<*>  '  ’ 


where 


Si,a</> 


p(y I £i ) 


Thus  (13)  expresses  the  checking  function  9a(y)  as  a  sum¬ 
mation  of  n  individual  checking  functions,  the  1th  of 
which  compares  the  predictive  ratio  of  the  hypotheses  "only 
y1  discrepant"  and  "no  observations  discrepant"  to  unity. 

It  Is  of  Interest  to  consider  the  nature  of  this  diag¬ 
nostic  check  for  the  situation  described  in  Bailey  and  Box  (1930 
where  P<y I  6 ,a2 ,z1 )  Is  NR( Xi e  )  for  1  *0,1,...,n.  bsii 
their  results, it  follows  that  (14)  reduces  to 


r  Uiiixrv'v  .•t,*!-'  . 

i  - ; i — r-J  -  1  » 

I  I  r  ;'v  ir-lv  ■  e  1 

-  ’-0  '0  “0  SC  J 


THIS  PA03  IS  BSSI  QUALITY  EftiflffLCAM* 


where 


p  =  length  of  parameter  vector  0, 

v  =  n-p,  ^ 

e,  ■  (xj,zj-1x1)-1x1,z)-1y,  f 

vs?  '  (y-x1ei)'!:t"1(y-x191). 

In  particular.  If  It  is  assumed  that  a  bad  observation  is 
one  having  a  standard  deviation  which  is  k  times  as  large  as  the 
common  standard  deviation  of  all  good  observations  (see,  for 
example.  Box  and  Tiao,  1968),  then  (15)  becomes 


9 


1  v 


1 

1  X6X0“<*)5i^i ' 

"2 

!  +  lL  r  2 

k 

l*o'xol 

1  +  v-1  ri 
_  «■, 

(  17) 


where 

Xi  =  X0  for  1  =  1 »  ••  • »  n 
xj  *  1th  row  of  Xq, 

4)  *  1 - Ij-  , 

K 

C,  *  (;v-)»O-*x,'(X0'X0)-';(), 
a  s. 


>  (  18) 
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For  the  special  case,  with  p=l ,  where  8  Is  a  location 


parameter,  and  thus  Xq=1  ,  it  follows  that  (17)  reduces  to 

1 

<19> 

where 

n-1 

Ri  =  (1  +7i^Tn?  ri2)"r"  •  (20) 

Hence,  from  (15)  ,  the  predictive  checking  function  is 

1 

«.WstOr»-n'  ‘2’> 

where 

n 

R  *  Z  R  • .  (22) 

i*l  1 

Note  that  the  Ri 's  and  R  quantities  considered  by  Chen  and  Box  (1979a) 
in  their  investigation  of  the  weighting  structure  of  the  posterior 
mean  of  a  location  parameter  for  a  contaminated  normal  population. 
Specifically,  each  R..  is  proportional  to  the  inverse  of  the  t-ordinate 
corresponding  to  the  standardized  residual  r..  of  the  "bad"  observation, 
with  ri  defined  in  (18);  so  that  if  ri  is  large,  due  to  the  ith 
observation  truly  being  discrepant,  then  R.  and  thus  R  will  also  be 
large,  yielding  an  unusually  large  value  of  the  predictive  checking 
function  g  (y),  as  given  by  (21). 

We  now  turn  to  a  specific  example  which  illustrates  the  practical 
use  of  the  diagnostic  checks  discussed  in  these  last  two  sections. 


John  (1978)  gives  data  from  a  2s  factorial  experiment  designed 


to  study  the  effects  of  five  factors  on  the  strength  of  a  type  of 
metal  coating  material,  with  abrasion  loss  as  the  measured  response. 
The  experiment  was  run  in  two  blocks,  confounding  the  highest  order 
interaction  with  the  operator  effect,  as  two  different  workers  applied 
the  coating. 

The  experimental  design  is  set  out  in  Table  la  ,  where  x^-Xg 
and  Xg  denote,  respectively,  the  five  factors  under  study  and  the 
blocking  factor,  and  where  for  each  xi  the  coding  is  such  that  the 
two  factor  levels  used  in  the  experiment  correspond  to  x^-l  and 
x-=l.  The  data  generated  from  this  design  are  given  in  Table  lb, 
along  with  the  corresponding  predicted  values  and  residuals  from 
fitting  the  data  to  the  model 


*u s  vAviu*  *  = 


i=l 


i=l  j=i+l 


9ijxiuxju+96x6u+eu; 


u=l , 


32,  (  23) 


t  h 

where  x^  denotes  the  value  of  x.  corresponding  to  the  u  observation. 

These  residuals  and  predicted  values  are  plotted  against  each 
other  in  Figure  la  ,  and  inspection  of  this  plot  shows  two  possibly 
deviant  observations  Cy-j q  and  yg).  John  performs  a  significance  test 
for  two  outliers,  based  upon  the  work  of  John  and  Draper  (1978),  and 
finds  "no  evidence  of  any  outliers  being  present".  However,  he  tempers 
this  conclusion  by  noting  that  a  similar  significance  test  for  just 
one  outlier  does  give  indication  that  y^g  is  an  outlier. 
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FIGURE  1 


RESIDUAL  PLOTS  FOR  JOHN'S  2 5  DATA  (UNTRANSFORMED) 
USING  M2. 
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FIG.  IB: 

VS.  RESIDUALS  FROM 
FITTING  Y^’S  to  M^; 
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When  applying  the  predictive  check  an  to  this  example, 
using  k=5,  it  is  found  that  h^=1.90  (and,  incidentally,  h2s2.03), 
so  that  go(y)  =  32[1.90-1]  =  28.8.  Note,  in  particular,  that  the 
value  of  this  checking  function  is  positive,  which  indicates  that 
the  log  pseudo-likelihood  for  a  will  initially  increase  as  a  is 
allowed  to  increase  from  oiq=0. 

Thus  this  diagnostic  check  is  giving  some  indication  that 
the  basic  model  should  be  elaborated  to  allow  for  the  possibility 
of  outlying  observations.  This  can  be  done  using  the  methodology 
developed  by  Bailey  and  Box  (1930),  which  entails  specifying  a  prior 
distribution  for  frequency  a  in  which  discrepant  observations  occur.  However 
for  present  purposes,  it  will  be  sufficient  to  consider  a  sensitivity 
analysis  approach  (Box  and  Tiao,  1968),  whereby  (i)  an  analysis  of 
the  data  is  performed  for  each  of  a  number  of  different  fixed  values 
for  a  and  (ii)  these  analyses  are  compared  to  assess  how  inferences 
are  affected  by  the  choice  of  a. 

Thus,  consider  the  model  (23)  ,  where  now  the  errors  eu  are 
assumed  to  come  from  a  contaminated  normal  distribution  for  which  (i) 
the  nature  of  the  contamination  is  an  inflation  of  the  variance  by 
a  factor  of  k2=25  and  (ii)  the  frequency  of  the  contamination  is  given, 
in  percentages,  by  100a».  The  posterior  means  and  posterior  standard 
deviations  of  the  parameters  in  (23)  ,  based  upon  the  assumption  just 
given,  are  exhibited  in  Table  2  for  various  choices  of  a.  Of  course. 


TABLE  2 


POSTERIOR  MEANS  AND  STD.  DEVIATIONS  (IN  PARENTHESES) 
FOR  0's  OF  M2  FOR  JOHN'S  25  DATA  (UNTRANSFORMED) 
FOR  VARIOUS  VALUES  OF  o  WITH  k  -  5 . 


a 

0 

.001 

.005 

.01 

.025 

.05 

.10 

e 

4.J6 

4.36 

4.32 

4.31 

4  •  2  £ 

4.26 

4.25 

0 

( 

.31) 

l 

.31 ) 

< 

.30) 

( 

.30) 

( 

.20) 

(  .28) 

( 

.27) 

a 

-.S'? 

-.90 

-.92 

-.95 

-1.C0 

-1.03 

-1.05 

h 

( 

.31) 

( 

.31 ) 

( 

.30) 

< 

.  30) 

( 

.28) 

(  .27) 

< 

.26) 

.4* 

.46 

.49 

.51 

.54 

.56 

.57 

92 

{ 

.31) 

( 

.31  > 

( 

.10) 

( 

.30) 

( 

.29) 

(  .28) 

t 

.27) 

9 

-.71 

-.70 

—  .68 

-.66 

-.62 

-.60 

-.59 

3 

( 

.3D 

< 

.31) 

( 

.30) 

( 

.20) 

( 

.29) 

i  .28 ) 

( 

.27) 

a 

2.44 

2.65 

2 . 6  3 

2.61 

2.57 

2.55 

2.54 

04 

( 

.31) 

< 

.31) 

( 

.30) 

( 

•  20) 

{ 

.29) 

(  .28  > 

( 

.27) 

0, 

.27 

.20 

•  30 

.3? 

.35 

.37 

.29 
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.11) 

< 

.31  ) 
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.30) 

< 

.30) 
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.29)  . 

(  .28) 

t 

•  2  7  ) 
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-•44 

-.<4 
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—  50 
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9 12 
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.31) 
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.26) 

6 

.14 

.16  ■ 
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13 
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.31) 

( 

.31  > 
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( 

.30) 

( 

.28  ) 
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( 

.26) 
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- .  i<- 
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-.30 

14 

< 

.31) 

< 

.31) 

( 

.30) 

< 

.20) 

( 
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<  .27) 
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.26  i 
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-.01 

15 
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.31) 
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t 
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23 
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( 
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( 

.29) 
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9 
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-.48 
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( 
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35 

< 
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( 
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( 

.29) 
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( 

.27) 

A 

.24 
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945 

< 

.31) 

< 

.31) 

< 
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< 
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( 
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( 
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0, 
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6 

< 
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< 
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( 
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1> 


the  ot=*0  column  corresponds  to  an  assumption  that  there  are  no 
discrepant  observations.  The  other  choices  of  a  considered  were 
a  *  .001,  .005,  .01,  .025,  .05  and  .10.* 

A  study  of  how  the  posterior  mean  change  as  a  increases 
shows  that  there  is  no  dramatic  shift  for  any  individual  parameter. 
However,  when  considered  as  a  whole,  these  posterior  means  for  the 
elements  of  0  are  significantly  affected  by  a  change  in  a.  This 
is  evident  through  a  comparison  of  the  posterior  means  using  o=0 
with  those  using  a*. 10,  since  it  is  seen  that  al_l_  p=*l 7  posterior 
means  differ  in  these  two  cases  by  at  least  one  unit  in  the  first 
decimal  place. 

Table  3a  gives  posterior  probabilities  of  each  yu  being 
discrepant  conditional  on  a  fixed  number  of  observations  r  being 
discrepant.  (Hence,  these  probabilities  do  no  depend  on  the  choice 
of  a.)  For  r=l  discrepant  observation,  y-jg  is  seen  to  be  approxi¬ 
mately  seven  times  as  likely  to  be  the  bad  observation  as  is  yg. 
Furthermore,  for  r=2,  yg  is  approximately  three  times  as  likely  to 
be  one  of  the  30  good  observations  as  it  is  to  be  one  of  the  two 
bad  observations,  while  y^g  Is  approximately  five  times  as  likely  to 
be  one  of  the  bad  values  as  It  is  to  be  one  of  the  good  values. 

♦In  carrying  out  these  analyses  as  well  as  similar  analyses  later  in 
this  chapter,  it  was  further  assumed  that  no  more  than  two  outliers 
were  present  in  the  data.  In  view  of  the  values  of  a  considered, 
this  assumption  is  not  unreasonable. 
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TABLE  3 


CONDITIONAL  POSTERIOR  PROBABILITY  THAT  yu  IS 
DISCREPANT  (I.E. ,  HAS  VARIANCE  INFLATED  BY  A  FACTOR 
k2  -  25)  GIVEN  THAT  EXACTLY  r  OBSERVATIONS  ARE 
DISCREPANT:  JOHN'S  25  DATA,  USING  M2 . 

^Ii  *  Pr<yubadly»  *  -  i) 

n 

qu|2  *  ** r  »  2)  -  l  Pr(yu  and  y^badjy,  r  «  2) 

V5*U 


(a)  Untransformed  (A  =  l)  (b)  Transformed  (A  =  0) 
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These  findings  seem  to  indicate  that  of  these  two  observations, 
which  are  candidates  for  being  discrepant  based  on  the  residual 
plot  in  Figure  la  ,  only  y^  is  indicated  as  truly  being  discrepant. 

Whereas  some  investigators  might  be  content  with  the  above 
analysis  of  the  data,  as  provided  by  the  elaborated  model  which 
takes  into  account  the  possibility  of  bad  data  values,  other  inves¬ 
tigators  might  wonder  if  there  were  alternate  modeling  options  that 
should  also  be  investigated.  In  particular,  by  noting  that 
ymax^ymin  "  6/1.0,  the  potential  for  an  improved  explanation  of  the 

data  through  transformation  of  the  response  variable  y  is  indicated. 

Thus,  consider  again  the  model  (23)  with  the  standard 
assumptions  concerning  the  eu'$.  The  diagnostic  checking  function 
(8)  can  be  used  to  assess  the  need  for  transforming  the  yu'$. 
Informally,  this  involves  observing  the  correlation  between  the 
residuals  from  fitting  the  yu's  to  the  model  and  the  residuals 
from  fitting  the  Y^'s  to  the  model,  with  Yu  defined  by  (9).  These 
two  sets  of  residuals  are  given  in  Tables  lb  and  lc  ,  respectively, 
and  they  are  plotted  against  each  other  in  Figure  lb  .  Strong 
negative  correlation  is  noticed  (specifically,  the  calculated  cor¬ 
relation  is  -.789),  thus  indicating  the  desirability  to  consider 
transformations  of  the  form  Yu^>  given  in  (6)  >  with  X<Xq=1. 

It  will  therefore  be  beneficial  to  elaborate  the  basic 
model  to  provide  for  this  needed  transformation.  This  can  be  done 
using  the  method  of  Box  and  Cox  (1964).  (See  also  Box  and  Tiao,  1973). 


Of  particular  interest  will  be  the  posterior  distribution  of  X  based 
on  a  locally  uniform  prior  for  X.  (Equivalently,  this  will  be  the 
pseudo-likelihood  function  of  X.)  Figure  2  shows  the  posterior 
distribution  of  X  not  only  for  the  model  (23)  (hereafter  referred 
to  as  M2)  with  yu^  substituted  for  yu»  of  course,  but  also  for  a 
constrained  model  M-|  which  is  strictly  linear  in  the  6  input  factors 
x-|-Xg  (i.e.,  M-j  excludes  from  Mg  the  ten  terms  corresponding  to  all 
two  factor  interactions  of  factors  1-5).  It  is  seen  that  both 
Pu(X|y,M2)  and  pu(X|y,M^)  strongly  support  the  use  of  a  logarithmic 
transformation  (X=0). 

The  constrained  model  M^  is  considered  in  addition  to  the 
original  model  Mg  because  it  is  often  the  case  that  a  suitable 
transformation  will  not  only  result  in  the  assumptions  on  the  cu's 
being  more  nearly  satisfied  but  also  will  result  in  a  simpler  form 
of  the  response  function  being  appropriate.  The  specific  assessment 
of  this  aspect  of  transformation  can  be  achieved  through  the  consider¬ 
ation  of  Figure  3  ,  which  is  a  plot*  of  F(x)  versus  X,  where  F(x)  is 
the  F  statistic  appropriate  for  testing  whether  the  10  interaction 
terms  of  Mg  (that  are  not  in  M^)  can  be  excluded  in  fitting  the 
transformed  y  ^'s.  [This  statistic,  which  has  as  a  predictive 
distribution  the  F  distribution  with  10  and  15  degrees  of  freedom 
when  M-|  is  true,  is  obtained  from  the  analysis  of  variance  table  in 

♦There  are  additional  plots  that  the  investigator  may  wish  to  look 
at  when  considering  the  question  of  transformation;  see,  for  example 
Draper  and  Hunter  (1969). 


the  usual  way  as  a  ratio  of  mean  squares.  As  an  illustration, 
the  analysis  of  variance  tables  are  given  in  Tables  4a  and  4b 
for  the  cases  X=1  (no  transformation)  and  X=Q  (log  transformation), 
respectively,  from  which  it  is  seen  that  F(1 )=1 . 58  and  F(0)*.54. 

The  message  to  be  gleaned  from  Figure  3  is  that  although 
there  is  no  value  of  X  (at  least  in  the  range  -2<X<2)  for  which 
the  first-order  model  M-j  is  judged  inappropriate,  all  values  of 
X  in  the  range  -2<X<0  lead  to  a  strikingly  better  first-order  fit 
than  is  provided  by  the  untransformed  data  (X=Xq=!1). 

In  particular,  consider  the  log  transformation,  as  suggested 
by  Figure  2.  The  transformed  data,  as  well  as  the  predicted  values 
and  residuals  obtained  from  fitting  to  the  unconstrained  model 
are  given  in  Table  Id  .  Figure  4a  is  a  plot  of  these  residuals 
versus  their  corresponding  predicted  values,  and  there  is  nothing 
about  this  plot  that  might  indicate  model  inadequacy.  Furthermore, 
when  the  logged  data  is  fitted  to  the  simplified  first-order  model  M^, 
the  resulting  plot  of  residuals  versus  predicted  values,  given  in 
Figure  4b  ,  also  does  not  suggest  any  model  inadequacy. 

Further  confirmation  of  the  benefits  derived  from  using  the 
log  transformation  can  be  given  by  illustrating  that  it  is  no  longer 
necessary  to  make  allowances  for  outliers.  This  can  be  shown  from 
both  the  diagnostic  checking  and  the  robustification  points  of  view. 
With  regard  to  diagnostic  checking,  not  only  does  the  residual  plot 
in  Figure  4 a  fail  to  indicate  any  spurious  observations,  but  also  the 


TABLE  4.  ANALYSIS  OF  VARIANCE  FOR  JOHN' S  2  DATA 


FIGURE  4 


PLOTS  OF  RESIDUALS  VS.  PREDICTED  VALUES  FOR  JOHN'S  2 5 
DATA  (USING  LOG  TRANSFORMATION). 
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diagnostic  check  (11)  Is  given  fay  ga(y)  *  n(h-|-l)  *  32(.49-l)  =  -16.3, 
and  hence  the  negative  value  for  this  checking  function  does  not  dis¬ 
credit  an  assumption  that  a*0.  With  regard  to  robustifi cation, 
consideration  of  Tables  5  and  3b  (which  parallel  Tables  2  and 
3a  for  the  untransformed  data)  shows  that  an  assumption  of  a*0  is  not 
unjustified.  Specifically,  Table  5  gives  the  posterior  means  and 
the  posterior  standard  deviations  for  the  parameters  of  M2  for  various 
fixed  values  of  a  when  fitting  the  logged  data,  and  these  posterior 
means  as  a  whole  are  seen  to  exhibit  somewhat  more  stable  behavior 
throughout  the  range  of  a  considered  than  was  exhibited  in  Table  2 
for  the  untransformed  data.  Furthermore,  the  posterior  probabilities 
for  any  given  yu  being  discrepant  conditional  on  a  fixed  number  of 
outliers  are  given  In  Table  3b  ,  and  although  the  probabilities  cor¬ 
responding  to  y1Q  and  y23  are  somewhat  larger  than  the  other  prob¬ 
abilities  in  both  the  r»l  and  the  r=2  cases,  there  is  no  compelling 
evidence  that  either  of  these  observations  are  outliers. 

We  summarize  the  results  of  our  investigation  of  John's  data 
as  follows: 

(i)  The  application  of  various  diagnostic  checks  indicates  that 
a  standard  analysis  of  these  data  (without  making  allow¬ 
ances  for  model  departures  such  as  the  presence  of  outliers 
or  the  need  for  transformation)  is  inappropriate. 

(11 )  Model  robustification  with  respect  to  the  possibility  of 
outliers  in  the  untransformed  data  shows  the  presence  of 
exactly  one  discrepant  value.  Recalling  that  the  residual 
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POSTERIOR  MEANS  AND  STD.  DEVIATIONS  (IN  PARENTHESES) 
FOR  9 ' s  OF  M2  FOR  JOHN'S  25  DATA  (USING  LOG 
TRANSFORMATION)  FOR  VARIOUS  VALUES  OF  a  WITH  k-5. 
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plot  in  Figure  la  (which  was  considered  during  the  diag¬ 
nostic  checking  of  the  standard  model)  suggested  that  as 
many  as  two  observations  may  be  discrepant,  we  thus  see 
that  the  situation  is  clarified  through  the  robustification 
of  this  model. 

(iii)  An  improved  explanation  of  the  data  can  be  obtained  by 
considering  model  robustification  with  respect  to  the 
possible  need  for  transformation  of  the  response  variable 
(rather  than  with  respect  to  the  possibility  of  outliers), 
and  this  leads  to  an  analysis  based  on  the  logarithmically 
transformed  data. 

5.  A  diagnostic  check  for  transformation  of  the  predictor  variables. 

In  the  example  just  discussed,  the  diagnostic  check  gx(y) 
indicated  the  need  to  consider  a  transformation  of  the  response 
variables  y  in  order  to  obtain  an  adequate  representation  of  the 
relationship  between  y  and  the  input  variables,  given  in  coded  form 
by  the  Xj's. 

In  situations  where  the  k  input  variables  under  consideration  — 
C*(C] ... • »£k) '  in  uncoded  form — are  all  quantitative  (i.e.,  can 
take  on  any  value  over  some  continuous  range),  it  may  be  the  case 
that  a  better  empirical  relationship  between  5  and  y  can  be  developed 
by  dealing  with  transformations  of  some  or  all  of  the  elements  of  5 
rather  than  with  $  itself. 
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In  particular,  consider  the  normal  linear  model  for  which 
p(y|9,o2,X)  is  N„(X^0,a2I),  so  that  for  the  uth  observation 
p(yu|8,a2,X)  is  N(xu^  0,a2),  with  xu^  corresponding  to  the  uth 
row  of  X^.  The  vector  X  =  (X1 .... .X^)' denotes  that  each  xu^ 
depends  on  the  uth  set  of  inputs  Cu  s  (Ciut...»Cku)'  only  through 


(V 


(X)  „ 

(  £  (Xl) 

^1u  . 

X-  xi 

^1U 

xi  t  o. 

1n  *iu 

X1  =  0. 

Note  that  X„  *  L,~0,  where  Xn  =  1.  Thus,  for  example,  the  investi- 
gator  may  wish  to  fit  a  low  order  polynomial  model  in  the  untransformed 
predictor  variables  X  to  the  data,  when  it  may  actually  be  more  appro- 
priate  to  be  using  the  same  order  polynomial  (or  perhaps  even  a  poly¬ 
nomial  of  lower  order)  expressed  in  terms  of  some  suitably  transformed 
predictor  variables  X~ •  For  this  situation  it  would  be  useful  to 
have  available  a  diagnostic  check  which  could  be  used  to  assess  the 
appropriateness  of  using  X  *  XQ  *  1.  Specifically,  the  particular 
checking  function  suggested  by  (l)  can  be  considered  for  each  x^. 

The  derivation  of  this  predictive  check  is  now  given. 

We  assume  that,  for  any  specified  X,  p(0,o2|X)  Is  locally 
uniform  in  0^,...,e  and  In  a.  However,  since  prior  Independence 
between  0  and  X  is  clearly  an  Inappropriate  assumption,  then  following 
the  argument  of  Pallesen  (1977,  Section  2.3.1  ),  we  employ  the  prior 


(25) 


p(0.ffa|X)«|X^V-)|2®2  • 


This  combines  with 


p(y|0,a2,X)  *  (2to2)’5  exp{-^2(y-X^^0)  '(y-X^9)}  (26) 

to  give  p(y,0,a2|X)  which,  upon  integrating  out  9  and  a2, yields 
the  conditional  predictive  distribution 


where 


Hence, 


P(y|X)«(vs2 )  “2  , 
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and  where 


xu  *  x  ^-0*.  R  =  XCX'X)*^',  X-X<*0>. 


A  /S 


>  (35) 


denote  the  appropriate  quantities  for  an  analysis  based  on  assuming 
A  -  Aq  *  1. 


Writing 


!i  = 


,  (BT)  +  _  (P) 
t  i  ri 


(36) 


the  overall  predictive  check  for  the  i 

y'Rz, 


,th 


predictor  variable  ^  is  thus 

(37) 


and  it  has  the  interpretation  of  seeking  a’ correlation  between  the 
residuals  from  fitting  y  to  N  (X0,a2I)  and  the  residuals  from  fitting 
z.j  to  the  same  model.  Strong  positive  correlation  will  suggest  that 
values  of  A.>1  may  be  more  appropriate,  while  strong  negative  correla^ 
tion  will  suggest  that  values  of  A-<1  may  need  to  be  considered. 

In  considering  the  two  components  which  combine  to  form  g.  (y) 

l’  " 

according  to  (30)  ,  it  is  noted  in  particular  that  the  first  compo¬ 
nent  g|®^(y)  can  be  interpreted  in  the  context  of  the  iterative 

i  ' 

procedure  for  the  joint  estimation  of  9  and  A  proposed  by  Box  and 
Tidwell  (1962).  (See  also  Box  and  Draper,  1979.)  The  steps  involved 
in  the  first  iteration  of  this  procedure  are  as  follows  (using. the 
notation  defined  above): 


(i)  Fit  the  model  Nn(X0,a2I)  to  the  data  y. 

(11)  Use  0  from  (1)  to  obtain  the  matrix  Z^BT^  having 

entries  z.^BT^  given  by  (33). 

(iii)  Fit  the  model  Nn(X0+Z^B^(X-1),o2I). 

In  the  particular  case  where  transformation  of  only  the  ith 

predictor  variable  C-j  is  contemplated,  step  (iii)  above  reduces 

to  fitting  the  model  Nn(X0+z. ^BT^(X^-l),c2I),  and  the  relevance 

of  g,  (y)  as  a  diagnostic  checking  function  in  this  context  is 
Ai  - 

readily  seen. 


It  is  interesting  to  note  that  the  estimated  0,  which  is 
calculated  using  an  assumption  that  X  =  Xq  =  1  is  used  in  step 
(ii)  above  as  a  matter  of  convenience,  according  to  the  authors; 


whereas  explicit  consideration  of  the  nature  of  0^  with  respect 

to  X  is  provided  through,  the  presence  of  the  second  term, 

(Pi 

'(y)»  in  the  function  g,  (y).  It  can  thus  be  expected  that 
i  '  Ai  ** 

diagnostic  checking  based  on  g^  (y)  will  be  more  sensitive  to 

the  need  for  a  power  transformation  of  than  diagnostic  checking 

based  on  the  results  of  the  first  iteration  of  the  Box-Tidwell 

procedure  (and  hence  based  on  g^T^(y)). 

Ai  - 

6.  Summary 

The  question  of  what  tentatively  to  include  in  a  model  and 
what  tentatively  to  omit  from  a  model  merits  careful  consideration 
by  the  investigator.  The  answers, of  course,  will  depend  upon  the 
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nature  of  the  phenomenon  that  the  investigator  is  attempting  to 
model.  What  is  clear,  however,  is  that  there  is  a  need  for  the 
investigator  to  have  for  his  use  not  only  (i)  well  developed 
techniques  which  allow  for  model  elaboration  with  respect  to 
those  aspects  of  model  departure  that  he  fears  the  most  and/or 
are  the  most  likely  to  occur,  but  also  (ii)  suitable  diagnostic 
checks  which  allow  for  model  criticism  with  respect  to  those 
aspects  not  explicitly  provided  for  in  the  model.  The  techniques 
developed  and  discussed  in  this  paper  are  of  use  in  filling 
these  needs. 

7.  Two  final  examples. 

In  the  introduction  of  this  paper  ,  reference  was  made  to 
the  suggestion  of  Chen  and  Box  (1979a, b)  that  a  contaminated 
normal  distribution  will  often  be  a  reasonable  choice  of  an  error 
distribution  when  modeling  data  from  a  well  planned  experiment.  To 
illustrate  the  practical  application  of  this  suggestion,  a  side- 
by-side  comparison  of  two  data  sets  analyzed  from  this  viewpoint 
is  presented  in  this  section. 

Both  sets  of  n=27  observations  were  from  experiments  using  the  same 
design.  The  design,  displayed  in  Table  6a  ,  can  be  used  in  fitting 
a  second  degree  polynomial  in  four  factors,  denoted  by 
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where,  as  indicated  above,  the  eu's  will  be  assumed  to  follow  a 
contaminated  normal  distribution.  (Specifically,  the  type  of 
contamination  used  in  the  Section  4  example  will  also  be  used 
here,  so  that  each  observation  has  a  probability  a  of  being 
contaminated  in  that  it  has  a  standard  deviation  which  is  k=5 
times  as  large  as  the  standard  deviation  of  an  uncontaminated 
observation.)  This  design  can  be  divided  into  three  orthogonal 
blocks,  which  explains  the  presence  of  the  linear  and  quadratic 
blocking  variables  xLu  and  Xqu  in  (38). 

The  two  sets  of  data  considered  are  those  of  Bacon  (1970) 
and  of  Box  and  Behnken  (1960).  (The  latter  reference  is,  in  fact, 
the  paper  in  which  the  design  in  Table  6a  was  introduced.) 

These  data  are  given  in  Tables  6b  and  6c  ,  along  with  the 
corresponding  predicted  values  and  residuals  from  fitting  (38) 
to  each  set  of  data  under  the  assumption  of  no  contamination 
(a=0).  Actually,  for  the  Bacon  data,  a  simpler  model  than  (38) 
was  found  to  be  aoequate,  in  that  terms  involving  x^  and  the  two 
clocking  terms  were  omitted  from  the  model. 

Plots  of  the  residuals  versus  the  predicted  values  are 
shown  in  Figure  5.  For  the  Bacon  data  (Figure  5a  )  no  unusual 
behavior  is  noticed.  However,  for  the  Box-Behnken  data  (Figure  5b), 
the  possibility  of  two  bad  values  (y^Q  and  y^)  is  suggested.  It 
is  of  interest  to  explore  how  the  analysis  of  each  of  these  sets 
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FIGURE  5. 
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PLOTS  OF  RESIDUALS  VS.  PREDICTED  VALUES  FOR  DATA  FROM 
BOX-BEHNKEN  DESIGN. 
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of  data  is  affected  when  the  possibility  of  discrepant  observa¬ 
tions  is  allowed  for  by  choosing  afO  and  using  the  techniques 
developed  by  Box  and  Tiao  (1968). 

Considering  the  Bacon  data  first,  the  posterior  means  and 
standard  deviations  for  the  p=10  parameters  used  in  the  model 
are  shown  in  Table  7  for  various  choices  of  a,  and  these 
quantities  are  seen  to  be  remarkably  stable  over  the  range  of 
a  studied.  Furthermore,  the  posterior  probabilities  for  any 
specified  yu  being  discrepant  conditional  on  a  fixed  number  of 
outliers  r,  which  are  given  in  Table  8a  for  the  cases  r=l  and 
r=2,  do  not  indicate  the  presence  of  any  bad  observations. 

Thus  the  conclusion  arrived  at  by  visual  inspection  of 
the  residual  plot  in  Figure  5a  —  namely,  that  there  are  no 
discrepant  observations  in  Bacon’s  data  —  is  further  supported, 
from  the  robustification  point  of  view,  by  an  analysis  using  a 
model  which  takes  Into  account  the  possibility  of  bad  observations. 

Of  course,  other  model  criticism  techniques  should  be 
employed  to  explore  other  areas  in  which  the  model  may  be 
improved  upon.  For  example,  consider  the  analysis  of  variance 
displayed  in  Table  9a.  In  this  table,  the  residual  sum  of 
squares  obtained  from  fitting  to  the  data  a  full  second  order 
polynomial  model  in  all  four  factors  (including  x^)  is  partitioned 
in  such  a  way  so  as  to  provide  diagnostic  checks  not  only  for  the 
need  to  account  for  block  to  block  variation,  but  also  for  the 
possibility  that  a  third  or  higher  order  polynomial  model  might 
be  needed  to  adequately  approximate  the  underlying  response  surface. 


Specifically  ,  letting  0..^,  and  j -j  denote  coefficients 

3  2 

of  ,  x-Xj  ,  and  x^.x^x^,  respectively,  in  a  third  order  polynomial 
representation;  we  have  the  following  properties  of  this  design: 

(i)  The  design  does  not  allow  for  the  separate  estimation  of  0^  and 

3 

eiii ,  since  x^  =  x^  for  all  design  points. 

(ii)  It  does  not  allow  for  the  estimation  of  6^,  since  x^x^x^  =  0  for 
all  design  points. 

(iii)  It  does  not  allow  for  the  separate  estimation  of  the  B^-'s,  but 
does  allow  for  the  estimation  of  certain  contrasts  of  the  0...'s 

'  J  J 

independent  of  the  estimation  of  coefficients  in  the  second 
order  model . 


TABLE  7  POSTERIOR  MEANS  AND  STD.  DEVIATIONS  (IN  PARENTHESES) 

FOR  0'S  OF  MODEL  FOR  BACON'S  DATA,  FOR  VARIOUS  VALUES 
OF  a  WITH  k  =  5 . 
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TABLE  8.  CONDITIONAL  POSTERIOR  PROBABILITY  THAT  yu  IS 

DISCREPANT  (I.E.,  HAS  VARIANCE  INFLATED  BY  A  FACTOR 
k2  =  25)  GIVEN  THAT  EXACTLY  r  OBSERVATIONS  ARE 
DISCREPANT:  DATA  FROM  FOUR  FACTOR  BOX- BEHNXEN  DESIGN. 
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V^U 
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With  regard  to  (iii),  each  sum  of  squares  denoted  by  in 
Table  9a  is  associated  with  all  linear  contrasts  of  the  0...  estimates 

1  J  J 

for  j  f  i.  The  remaining  sum  of  squares  at  the  bottom  of  the  table 
is  associated  with  coefficients  of  order  higher  than  three.  (A 
similar  type  of  partitioning  has  been  previously  explored  by  Draper 
and  Herzberg,  1971,  for  the  central  composite  response  surface  designs 
for  some  related  work  involving  single  degree  of  freedom  contrasts, 
see  Box,  Hunter  and  Hunter,  1978,  and  Box  and  Draper,  1979.)  The 
somewhat  large  sum  of  squares  given  for  L-|  in  the  table  suggests 
possible  lack  of  fit  of  a  second  order  polynomial  due  to  contrasts 
of  9^22’  ei 33 »  an<*  ei44  significant  non-zero  magnitude.  Practi¬ 
cally  speaking,  this  suggests  that  a  closer  approximation  to  the 
true  surface  may  be  gained  by  transforming  one  or  more  of  the  four 
factors  before  fitting  the  second  order  polynomial  (Box  and  Draper, 
1979),  but  this  possibility  will  not  be  explored  in  further  detail 
here. 

Turning  now  to  the  Box-Behnken  data  (Table  6c),  Table  10  gives 
the  posterior  means  and  standard  deviations  for  the  p  =  17  parameters 
in  the  model  (38).  It  is  seen  that  many  of  these  quantities  undergo 
a  noticeable  change  when  the  contamination  frequency  parameter  a  is 
allowed  to  assume  non-zero  values.  The  most  dramatic  of  the  changes 
in  the  expectations  is  observed  in  the  posterior  mean  for  6^,  where 
the  contemplation  of  even  a  small  non-zero  value  for  a  switches  the 
sign  of  E(014|y,a).  Moreover,  for  all  parameters  except  014,  the 


TABLE  10.  POSTERIOR  MEANS  AND  STD.  DEVIATIONS  (IN  PARENTHESES) 
FOR  6's  OF  MODEL  FOR  THE  BOX-BEHNKEN  DATA,  FOR 
VARIOUS  VALUES  OF  a  WITH  k=5. 
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corresponding  posterior  standard  deviations  are  about  halved,  which 
effectively  increases  the  size  and  sensitivity  of  the  experiment  by 
a  factor  of  four. 


Note,  however,  that  from  Table  8b  the  posterior  probabilities 
for  yu  being  bad,  given  that  exactly  r  values  are  bad,  suggest  that 
there  is  only  one  bad  observation  and  not  two,  as  first  suggested 
by  the  Figure  5b  residual  plot.  Specifically,  for  r  =  1,  y-|Q  receives 
94%  weight  as  compared  to  6%  weight  for  y^l  while  the  weights  for 
r  =  2  do  not  strongly  indicate  that  both  of  these  observations  are 
outliers.  Finally,  in  considering  the  analysis  of  variance  in 
Table  9b,  it  is  seen  that  the  sums  of  squares  corresponding  to  both 
L|  and  l_4  are  large.  This  is,  of  course,  what  would  be  expected  if 
it  were  known  that  any  one  of  observations  10-13  was  an  outlier, 
since  the  effect  that  the  bad  observation  would  have  on  the  posterior 
distribution  of  0-|4  when  a  /  0  is  allowed  would  be  "hidden"  in  the 
e-|-|4  and  0144  third  order  bias  coefficients  under  the  restrictive 
a  =  0  model. 

Computational  Footnote. 

The  present  speed  and  capacity  of  computers  make  it  feasible 
to  utilize  model  robustification  techniques,  such  as  those  illustrated 
in  these  two  examples,  as  well  as  in  the  example  of  section  4, 

5 

involving  John's  2  data,  to  guard  against  feared  model  discrepancies. 
However,  much  work  needs  to  be  done  in  the  area  of  developing  general 
programs  which  will  actually  do  these  robust  analyses  and  thus  take 
advantage  of  the  existing  computer  capabilities. 
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